processing  when  attention  has  been  focused  on  a  particular  aspect  of  the  image.  Two  experiments 
investigated  the  ability  of  human  observers  to  detect  and  recognize  simple  objects  in  visual 
images.  Prior  to  presentation,  the  images  were  transformed  by  spatial  frequency  filters  to 
emphasize  the  global-  (low  spatial  frequencies),  local-  (high  spatial  frequencies)  or 
intermediate-  (mid  spatial  frequencies)  scale  structure.  Four  categories  of  top-view  ship  hulls 
were  synthesized  for  the  experiments.  In  the  first  experiment  separate  groups  of  observers  made 
both  detection  (which  quadrant  of  the  display  contained  a  ship?)  and  recognition  (which  of  the 
four  ships  occurred?)  judgments.  In  the  second  experiment,  observers  also  selected  the  filter 
condition  to  be  displayed  on  each  trial  prior  to  the  detection  or  recognition  response. 


The  results  showed  that,  as  expected,  filter  condition  had  a  large  effect  on  recognition 
performance  with  the  unfiltered  images  recognized  better  than  the  high-frequency  images  which  in 
turn  were  recognized  better  than  both  the  mid-  and  low-frequency  images.  As  predicted,  and  in 
contrast  to  the  recognition  data,  the  low-frequency  images  led  to  better  detection  performance 
than  either  the  mid-  or  the  high-frequency  conditions.  The  low-frequency  images  were  also  more 
easily  detected  than  the  unfiltered  images.  However,  when  permitted  to  select  a  filter  condition 
to  observe  in  the  second  experiment,  observers  did  not  select  the  optimal  unfiltered  image  for 
recognition,  but  selected  the  original  and  the  high-frequency  images  equally  often.  In  contrast, 
for  detection  observers  consistently  selected  the  optimal  low-frequency  images.  The  observer 
selection  results  indicate  that  individuals  are  not  always  able  to  anticipate  the  viewing 
conditions  which  will  lead  to  optimal  perceptual  performance.  The  implications  of  these  results 
for  human/computer  interaction  in  image  processing  are  discussed. 
It  is  obvious  that  the  physical  world  is  highly  structured  and 
that  this  structure  exists  at  many  levels.  For  example,  the 
surfaces  of  objects  in  an  office  possess  a  gross  or  very  global 
structure  as  in  the  outline  form  of  the  desk,  bookshelves  or 
computer  terminal.  On  the  other  hand,  these  objects  may  also  be 
characterized  in  terms  of  their  more  detailed,  local  structure  as  in 
the  shapes  of  individual  books,  desk  drawers  or  the  terminal 
keyboard.  When  light  is  reflected  from  these  surfaces  to  create  an 
image,  intensity  variations  over  the  two  image  dimensions  capture 
many  aspects  of  this  three-dimensional  structure.  The  most 
fundamental  problem  of  spatial  vision  is  to  understand  the  way  in 
which  this  information  is  used  to  interpret  the  visual  world. 
Although  psychologists  and  others  have  been  interested  in  this  very 
basic  perceptual  problem  for  many  years,  a  full  understanding  has 
remained  elusive. 

Computer  vision  theorists  have  argued  recently  that,  to  be 
successful,  visual  analyses  must  take  place  across  several  levels  of 
image  scale,  with  each  level  contributing  to  the  overall 
understanding  of  the  objects  in  the  image  (Marr,  1982;  Yuille  & 
Poggio,  1983;  Crowley  &  Sanderson,  1984).  Interestingly, 
considerable  physiological  and  psychophysical  evidence  suggests  that 
the  mammalian  visual  system  may  operate  in  this  fashion  (Sekuler, 
1974).  Specifically,  the  visual  system  contains  a  series  of 
independent  channels  or  analyzers,  each  sensitive  to  image  structure 
at  a  different  scale  (Julesz  &  Schumer,  1981).  These  channels  are 
thought  of  as  broadly-tuned  spatial  frequency  filters  (Julesz, 
1980),  bar  detectors  of  varying  widths  or  sizes  (Macleod  & 
Rosenfeld,  1974),  or  zero-crossing  filters  of  different  bandwidths 
(Marr,  1982).  In  this  view,  image  (and  hence  object)  structure  is 
processed  separately  at  different  scales  by  the  various  channels. 
For  example,  the  global  structure  of  an  image  is  extracted 
independently  of  any  local  detail  (or  vice-versa),  and  some  have 
argued  that  this  global  analysis  may  actually  precede  or  dominate  the 
more  local  analysis  (Hughes,  Layton,  Baird,  &  Lester,  1984). 

With  the  increasing  evidence  for  the  existence  of  these 
channels,  research  has  turned  to  the  question  of  their  role  or 
function  in  vision.  Several  individuals  have  argued  that  the  broad, 
low-frequency  channels  respond  to  global  or  Gestalt  properties  of  an 
image  and  are  important  in  early  processing — for  instance,  during  an 
initial  glance  at  an  image  (Broadbent,  1977;  Julesz,  1980).  In 
contrast,  the  high-frequency  channels  are  sensitive  to  local  detail 
and  are  important  in  later  visual  processing  when  attention  has  been 
focused  on  a  particular  aspect  of  the  image.  Despite  the  growing 
popularity  of  this  view,  relatively  little  experimental  work  has 
explored  the  implications  of  these  hypothesized  differences  for 
visual  perception.  The  experiments  reported  in  this  paper  address 
this  question  by  investigating  the  ability  of  human  observers  to 
detect  and  recognize  simple  objects  in  visual  images.  Prior  to 
presentation,  the  images  are  transformed  by  spatial  frequency 
filters  to  emphasize  the  global-  (low  spatial  frequencies),  local- 
(high  spatial  frequencies)  or  intermediate-  (mid  spatial  frequencies) 
scale  structure.  The  results  support  the  hypothesis  that  spatial 
scale  plays  an  important  role  in  the  detection  and  recognition  of 
objects  in  visual  imagery. 
Evidence  For  Visual  Channels 

In  1843  Ohm  proposed  that  the  human  auditory  system  can 
decompose  a  complex  sound  into  its  elementary  frequency  components 
(cited  in  Julesz,  1980).  Ohm's  Acoustical  Law — as  this  proposal  is 
known — paved  the  way  for  Helmholtz,  von  Bekesy,  and  ultimately 
Fletcher,  to  develop  a  view  of  the  auditory  system  which  is  based  on 
a  set  of  broadly-tuned  filters,  called  critical  bands,  each  of  which 
responds  to  only  a  subset  of  frequencies  in  the  audible  spectrum. 
These  filters,  or  channels,  form  the  basis  of  much  of  contemporary 
auditory  theory.  The  argument  that  they  exist  is  intuitively 
compelling  since  it  is  common  experience  to  hear  the  tonal 
components  when  listening  to  a  complex  sound  such  as  a  musical 
chord. 

Although  Young  proposed  the  existence  of  separate  channels  for 
color  vision  in  the  early  1800's,  the  analogous  concept  of 
independent  channels  in  human  spatial  vision  is  a  relatively  recent 
proposal.  Campbell  and  Robson  (1968)  were  the  first  to  suggest  that 
vision  may  be  based  on  a  set  of  spatial  frequency  analyzers  each  of 
which  responds  to  only  a  narrow  range  of  spatial  frequencies.  This 
proposal  suggests,  unintuitively,  that  at  some  level  in  the  visual 
system,  a  complex  pattern  may  be  decomposed  into  a  finite  set  of 
simpler,  periodic  intensity  patterns.  Despite  its  lack  of  intuitive 
appeal,  this  basic  idea  has  gained  wide  acceptance  in  recent  years 
with  significant  support  from  both  physiological  and  psychophysical 
findings  (see  reviews  by  Sekuler,  1974;  DeValois  &  DeValois,  1980; 
Julesz  &  Schumer,  1981). 
One implication of the multichannel model of spatial vision is
that overall spatial sensitivity, as measured by the modulation
transfer function (MTF) for example, reflects the envelope of a
number of individual sensitivity curves. Two basic questions
follow. First, how many individual channels exist, and second, what
is the underlying sensory mechanism for each channel? Both
questions are addressed in a model proposed by Wilson and Bergen
(1979). The model proposes that four broadly-tuned, size-sensitive
mechanisms exist at each point in the retina. Furthermore, the size
of these units increases linearly with eccentricity on the retina,
and the composite sensitivity at any point results from probability
summation across the four units. The proposed units resemble the
on-center, off-surround retinal cells described by Kuffler (1953),
with a sensitivity profile characterized by a difference of Gaussian
distributions, one narrow and positive (excitatory) and the other
broad and negative (inhibitory). Although more recent work has
reinterpreted these basic units to be zero-crossing (Marr, 1982) or
other (e.g., Daugman, 1983) filters rather than size-sensitive
units, the distinction between these interpretations is not
especially important for this paper. The important point for the
present argument is that at least four broadly-tuned visual channels
seem to exist which respond to information at different spatial
scales. Whether these channels reflect size-sensitive units,
zero-crossing filters, or spatial frequency filters is not of
concern here.

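The difference-of-Gaussians sensitivity profile described above can be illustrated with a minimal sketch. The function name and the two width parameters are hypothetical choices made for illustration only; they are not the values fitted by Wilson and Bergen (1979), and the equal weighting of the two Gaussians is an assumption.

```python
import numpy as np

def dog_profile(x, sigma_center, sigma_surround):
    """Difference of two unit-area Gaussians (illustrative, not fitted values).

    The narrow Gaussian is the positive (excitatory) center and the broad
    Gaussian is the negative (inhibitory) surround.
    """
    x = np.asarray(x, dtype=float)
    center = np.exp(-x**2 / (2 * sigma_center**2)) / (sigma_center * np.sqrt(2 * np.pi))
    surround = np.exp(-x**2 / (2 * sigma_surround**2)) / (sigma_surround * np.sqrt(2 * np.pi))
    return center - surround

# Hypothetical widths in arbitrary spatial units: the surround is twice as broad.
positions = np.linspace(-8.0, 8.0, 161)
profile = dog_profile(positions, sigma_center=1.0, sigma_surround=2.0)
```

With equal-area Gaussians the profile sums to approximately zero, so such a unit would not respond to uniform illumination and behaves as a band-pass mechanism. Wider units of the same shape would be tuned to coarser spatial scales.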
Much  of  the  psychophysical  evidence  for  the  existence  of 
spatial  channels  is  based  on  experiments  with  one-dimensional 
sinusoidal  grating  patterns.  In  this  type  of  pattern,  intensity 
varies  sinusoidally  in  one  dimension  with  this  variation  extended 
redundantly  across  the  second  dimension.  Surprisingly  few  studies 
have  used  two-dimensional  patterns  as  may  occur  in  realistic 
imagery.  Fortunately,  in  cases  where  complex  imagery  such  as  faces 
(Harmon  &  Julesz,  1973;  Fiorentini,  Maffei,  &  Sandini,  1983),  scenes 
(Caelli,  1983)  or  complex  textures  (Ginsburg,  1978;  Caelli,  1982) 
have  been  used,  the  results  have  been  consistent  with  the 
multichannel  model.  The  imagery  investigated  in  the  present 
experiments  depicted  simple  top-view  intensity  profiles  of  simulated 
ship  hulls  on  uncluttered  backgrounds. 

The  Role  of  Channels  in  Spatial  Vision 

As  the  evidence  for  the  existence  of  multiple,  scale-sensitive 
channels  has  accumulated,  increasing  numbers  of  investigators  have 
speculated  on  their  possible  role  in  spatial  vision.  Most 
discussions  of  this  issue  have  pointed  out  that  one  should  be 
cautious  in  assuming  that  the  channels  literally  perform  a  spatial 
Fourier  analysis  which  could  lead  to  a  reconstruction  from  the 
orthogonal  components.  The  small  number  of  channels  and  two-octave 
bandwidths  proposed  are  too  limiting  for  this  purpose  (Julesz, 
1980).  Rather,  most  speculation  on  the  role  of  these  channels  has 
involved  some  kind  of  underlying  attentional  mechanism. 
In  an  early  proposal,  Broadbent  (1977)  identified  two  gross 
stages  in  human  visual  analysis,  an  early,  relatively  automatic 
preattentive  stage  and  a  subsequent  active,  attentive  analysis.  In 
his  view,  the  early  processing  is  based  on  global  information  and 
serves  to  segregate  "...detailed  stimuli  into  bundles  or  segments 
that  can  be  attended  to  or  rejected  as  a  whole"  (p.  112).  In 
contrast,  the  later  processing  is  based  on  the  detailed  information 
in  the  image.  He  speculated  further  that  the  visual  mechanism  which 
accomplishes  this  analysis  could  very  well  be  the  scale-sensitive 
channels  described  in  the  previous  section  of  this  report. 

A  more  complete  attentional  hypothesis  has  been  developed  by 
Julesz  (Julesz,  1980;  Julesz  &  Papathomas,  1984).  He  proposes  that 
the  spatial  channels  serve  as  a  kind  of  "perceptual  zoom  lens"  that 
permits  an  image  to  be  analyzed  at  any  of  a  number  of  levels.  For 
example,  "...a  low-frequency  channel  will  discard  fine  details  and 
thereby  emphasize  the  overall  layout  of  the  entire  picture.  A 
high-frequency  channel  brings  the  local  details  into  prominence  at 
the  expense  of  the  large-scale  regions  and  structure"  (1980,  p. 
309).  He  points  out  that  the  assumed  two-octave  bandwidths 
proposed  for  the  filters  would  permit  three  "lenses"  at  low-  (.5-2 
cycles/degree  of  visual  angle),  mid-  (2-8  cycles/degree),  and  high- 
(8-32  cycles/degree)  spatial  scales.  Although  some  controversy 
exists  (e.g.,  Gellatly,  1983),  Julesz  and  Papathomas  (1984)  have 
recently  presented  some  demonstrations  which  support  what  they  term 
a  strong  version  of  this  attentional  hypothesis — that  the  spatial 
channels  function  in  attention  and  that  the  observer  can  exert 
control  over  the  specific  channel  that  will  be  dominant  at  any 
instant. 

Related  to  this  are  some  recent  discussions  of  the  relation 
between  spatial  scale  and  the  traditional  Gestalt  distinction 
between  "figure"  and  "ground."  Julesz  and  his  colleagues  (Julesz, 
1978;  Julesz,  1980)  have  suggested  that  the  "figure"  portion  of  an 
image  receives  a  more  detailed  analysis  than  the  "ground"  portion. 
Presumably,  this  would  involve  high-  and  low-frequency  channels  for 
the  "figure"  and  "ground,"  respectively.  This  was  supported  in  a 
simple,  but  informative  visual  detection  experiment  by  Wong  & 
Weisstein  (1983).  Prior  to  presenting  a  stimulus,  observers  were 
asked  to  fixate  an  ambiguous  goblet/faces  image  and  to  indicate  when 
a  designated  portion  of  it  (e.g.,  goblet)  was  seen  as  figure.  In 
this  way  a  small  test  line  could  be  presented  in  either  a  figure  or 
a  ground  region  of  the  display.  Two  line  targets  of  different 
spatial  frequency  content  were  used,  a  sharply  defined  line  which 
had  a  relatively  broad  spectrum,  and  a  blurred  line  which  had  a 
markedly  peaked  spectrum  with  most  of  its  energy  at  lower 
frequencies.  In  other  words,  the  sharp  target  had  considerably 
greater  high  frequency  content  than  did  the  blurred  target.  The 
results  of  their  experiments  revealed  that  the  high  spatial 
frequency  target  was  more  accurately  detected  in  a  region  perceived 
as  "figure"  whereas  the  low-frequency  target  was  more  accurately 
detected  in  a  "ground"  region.  They  concluded  that  the  global 
character  and  the  rapid  response  time  (see  following  section) 
generally  attributed  to  low  spatial  frequency  channels  make  them 
well  suited  for  processing  image  ground  (Wong  &  Weisstein,  1983). 
This  suggests  that  the  subjective  state  of  attending  to  a  spatial 
region,  the  figure,  selectively  activates  the  high  detail  channels. 
These  results  are  consistent  with  the  attentional  hypothesis 
outlined  earlier.  Unfortunately,  relatively  few  studies  have 
actually  examined  the  hypothesis  empirically  as  in  this  case. 

Global  Precedence  and  Low-Frequency  Dominance 

An  important  aspect  of  Broadbent's  discussion  of  the  role  of 
spatial  channels  in  attention  is  the  notion  that  the  global 
(low-frequency)  analysis  temporally  precedes  the  local 
(high-frequency)  analysis.  This  refers  to  a  recurring  theme  in 
recent  visual  information  processing  studies,  and  is  sometimes 
referred  to  as  the  global  precedence  effect  (Navon,  1977;  Ward, 
1982;  Hughes,  Layton,  Baird,  &  Lester,  1984).  As  implied  by  the 
title  of  Navon's  original  1977  article,  the  global  precedence  effect 
asserts  that  in  a  relative  sense  the  global  information  in  an 
image  ("the  forest")  will  be  processed  before  the  local 
information  ("the  trees").  Although  the  proposal  is  not  without 
controversy  (see  for  example,  Miller,  1981;  Ward,  1982),  most  agree 
that  global  dominance  is  often  observed. 

In  a  recent  study,  Hughes  and  his  colleagues  have  examined  the 
effect  under  a  variety  of  conditions.  They  present  the  argument 
that  local  and  global  processing  occurs  concurrently  and  that  the 
presence  of  global  cues  can  serve  to  retard  the  processing  of  local 
information.  They  also  speculate  that  global  precedence  may  result 
from  asymmetric  neural  inhibition  between  the  local  and  global 
spatial  frequency  channels  (Morrone,  Burr,  &  Maffei,  1982,  cited  by 
Hughes  et  al.,  1984).  It  is  also  interesting  to  note  that  Wilson  & 
Bergen  (1979)  attribute  different  temporal  response  characteristics 
to  their  four  size-sensitive  mechanisms.  These  findings  are 
reminiscent  of  earlier  work  which  revealed  two  types  of  temporal 
response  in  spatially-sensitive  retinal  cells.  As  Braddick, 
Campbell,  and  Atkinson  (1978)  summarize,  "The  X-  or  sustained  cells 
show  linear  spatial  summation,  small  receptive  fields  (and  hence  a 
good  response  to  high  spatial  frequencies  but  a  poor  response  to 
low),  and  a  sustained  temporal  response.  The  Y-  or  transient  cells 
are  spatially  nonlinear  and  respond  to  lower  spatial  frequencies 
than  X-cells  in  the  corresponding  retinal  region."  (p.  27). 
Although  the  implication  of  these  cells  in  the  attentional  processes 
that  Broadbent  (1977)  distinguished  would  be  very  speculative,  it  is 
of  interest  that  known  temporal  response  properties  of  spatial 
vision  channels  are  consistent  with  the  attentional  hypothesis. 

Experiment  1. 

The  purpose  of  this  experiment  is  to  investigate  the  ability  of 
human  observers  to  detect  and  recognize  simple  two-dimensional 
visual  objects  under  conditions  where  the  low-,  mid-,  or 
high-spatial  frequency  content  is  dominant.  The  objects  were  four 
simulated  top  views  of  ship  hulls  distinguished  by  the  presence  of 
one  or  two  deck  houses  and  by  the  presence  of  square  or  circular 
upper  deck  structures.  The  research  summarized  in  the  preceding 
discussion  suggests  that  spatial  frequency  should  be  of  major 
importance  in  determining  the  detection  and  recognition  performance 
achieved  with  the  spatially  filtered  images.  For  example,  since  the 
visual  cues  which  permit  the  four  ships  to  be  discriminated  involve 
relatively  fine  detail  and  hence,  primarily  high  spatial  frequencies 
(see  method  for  a  more  complete  discussion),  recognition  performance 
should  be  best  when  high-frequencies  are  dominant  but  difficult  or 
impossible  when  this  information  is  reduced  as  in  the  low-  and 
mid-dominant  conditions.  On  the  other  hand,  the  low  frequency 
channels  should  play  a  primary  role  in  detection  and,  therefore,  the 
low-dominant  conditions  should  lead  to  optimal  detection 
performance.  By  a  parallel  argument,  the  mid-  and  high-dominant 
images  should  be  relatively  difficult  to  detect. 

To  summarize,  according  to  the  attentional  hypothesis  for  the 
ship  images  employed  here,  the  low-frequency  dominant  imagery  should 
lead  to  poor  recognition  performance,  but  to  very  good  detection 
performance.  In  contrast,  the  high-frequency  dominant  images  should 
lead  to  good  recognition  performance,  but  relatively  poor  detection 
performance. 

Method 

Observers.  Six  paid  undergraduate  volunteers  served  as 
observers  in  the  experiment.  Two  served  in  both  the  detection  and 
recognition  tasks,  two  in  only  the  detection  task,  and  two  in  only 
the  recognition  task.  All  of  the  participants  had  normal  or 
corrected-to-normal  vision. 


Apparatus.  Image  preparation,  control  of  experimental  events, 
and  data  analyses  were  carried  out  on  a  general  purpose  laboratory 
computer  (Digital  PDP-11/23).  This  computer  served  as  a  controlling 
host  for  a  Gould  Imaging  and  Graphics  IP8400  image  processing  system 
which  was  used  for  on-line  image  processing,  storage,  and 
presentation.  Participants  were  seated  in  a  darkened  room  and 
viewed  the  test  imagery  on  a  high-resolution,  9  in  (22.9  cm) 
diagonal,  monochrome  monitor  (Cohu  Model  DM  9/C)  at  a  viewing 
distance  of  122  cm.  The  image  was  displayed  with  a  resolution  of 
256  by  256  8-bit  pixels  in  one  quadrant  of  a  512  by  512  pixel 
display.  Participants  entered  their  responses  on  a  standard 
terminal  keyboard,  and  verbal  feedback  was  displayed  on  the  monitor 
by  means  of  the  IP8400  alphanumeric  generator. 

Imagery.  Preparation  of  the  test  imagery  involved  several 
steps.  Initially,  top-view  images  of  the  four  ships  were  created  by 
varying  a  two  dimensional  intensity  profile  as  shown  in  Figure  1. 
Ships  A  and  C  are  characterized  by  a  split  deck  house  with  square 
and  circular  upper  deck  structures,  respectively.  Ships  B  and  D 
have  a  single  deck  house  with  square  and  circular  upper  deck 
structures,  respectively.  It  is  clear  from  Figure  1  that  the 
differences  among  the  four  ships  are  based  on  a  small  number  of 
pixels  and  hence  on  relatively  high  spatial  frequencies.  The  gap 
distinguishing  the  split  and  full  deck  house  is  six  pixels  (.088 
degree  of  visual  angle  at  the  4  ft  viewing  distance),  and  the 
difference  between  the  circular  and  square  deck  structures  is  three 
pixels  on  the  diagonal  (.044  degree). 
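The visual angles quoted above can be cross-checked with a short calculation. This is a minimal sketch: the helper names are hypothetical, the pixel pitch is inferred from the reported 0.088 degree for six pixels rather than taken from any monitor specification, and the three-pixel diagonal difference is treated as three pixel pitches.

```python
import math

def visual_angle_deg(extent_cm, distance_cm):
    """Visual angle (degrees) subtended by an extent viewed face-on."""
    return math.degrees(2.0 * math.atan(extent_cm / (2.0 * distance_cm)))

def extent_for_angle_cm(angle_deg, distance_cm):
    """Inverse of visual_angle_deg: physical extent for a given visual angle."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

VIEWING_DISTANCE_CM = 122.0  # 4 ft viewing distance given in the text

# Assumption: infer the on-screen pixel pitch from the reported six-pixel gap.
gap_cm = extent_for_angle_cm(0.088, VIEWING_DISTANCE_CM)
pixel_pitch_cm = gap_cm / 6

# Angle implied for a three-pixel difference under the same pitch.
three_pixel_deg = visual_angle_deg(3 * pixel_pitch_cm, VIEWING_DISTANCE_CM)
```

Under these assumptions the three-pixel figure comes out at about half the six-pixel figure, which agrees with the .044 degree reported.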

Figure 1. Unfiltered top-view images of simulated ship hulls. Ship
A appears in the upper left, ship B in the upper right, ship C in
the lower left, and ship D in the lower right.

Once the images were constructed, three transformed versions of
each ship were created to emphasize low-, mid-, and high-spatial
frequencies. Each ship image was Fourier transformed using an FFT
algorithm (see Gonzalez & Wintz, 1977). The frequency domain
representation of each ship was then multiplied by circular low-pass
and band-pass "pill-box" filters with two-octave bandwidths. The
low-pass filter was centered at 1 cycle/degree (0-8 pixels), the
mid-pass filter at 5 cycles/degree (9-32 pixels), and the high-pass
filter (actually a band-pass filter) at 21 cycles/degree (33-128
pixels). The resulting data were then inverse transformed back into
the image domain for presentation. Although two-octave filters were
used, the resulting displayed imagery had somewhat broader
bandwidths because of the mapping used to display the transformed
images. The resulting images had dominant information in the low,
mid, and high spatial frequency regions. These images are shown in
Figure 2.
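The filtering steps described above (forward FFT, multiplication by a circular pill-box mask, inverse FFT) can be sketched with NumPy. This is an illustration under stated assumptions, not the original PDP-11/IP8400 implementation: the function names are hypothetical, the band edges are the frequency-domain pixel radii quoted in the text, and non-integer radii between two bands (for example, between 8 and 9) are assigned to the higher band. The display mapping that broadened the bandwidths is not modeled.

```python
import numpy as np

# Radial band edges in frequency-domain pixels (cycles per 256-pixel image).
# Assumption: each band covers r_low < radius <= r_high, except that the
# low-pass band also includes the zero-frequency (DC) term.
BAND_EDGES = {"low": (0.0, 8.0), "mid": (8.0, 32.0), "high": (32.0, 128.0)}

def pillbox_mask(shape, r_low, r_high):
    """Binary circular (annular) mask defined on the unshifted FFT grid."""
    rows, cols = shape
    ky = np.fft.fftfreq(rows) * rows  # integer frequency indices, rows
    kx = np.fft.fftfreq(cols) * cols  # integer frequency indices, columns
    radius = np.hypot(ky[:, None], kx[None, :])
    inner = radius >= r_low if r_low == 0 else radius > r_low
    return inner & (radius <= r_high)

def pillbox_filter(image, r_low, r_high):
    """Forward FFT, multiply by the pill-box mask, inverse FFT."""
    spectrum = np.fft.fft2(image)
    mask = pillbox_mask(image.shape, r_low, r_high)
    # The mask is symmetric about the origin, so the result is real up to
    # rounding error; np.real discards the residual imaginary part.
    return np.real(np.fft.ifft2(spectrum * mask))
```

If the 0.088 degree quoted for six pixels is taken at face value, the 256-pixel image spans roughly 3.75 degrees, so radii of 8, 32, and 128 frequency-domain pixels would fall near 2, 8.5, and 34 cycles/degree. The mid-points of the resulting bands are then close to the 1, 5, and 21 cycles/degree centers given in the text.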

Finally,  the  transformed  and  original  ship  images  were  adjusted 
to  have  equivalent  mean  luminance  when  displayed  on  the  calibrated 
monitor  as  measured  by  a  Photo  Research  Model  502  spot  photometer. 
Pilot  experimentation  was  carried  out  to  establish  presentation 
durations  and  intensities  which  would  yield  acceptable  performance 
levels  in  the  detection  and  recognition  tasks,  that  is,  with  neither 
floor  nor  ceiling  effects  in  either  task.  For  recognition,  a 
display  time  of  approximately  132  ms  was  used  with  a  mean  display 
luminance  of  approximately  15.52  cd/m².  For  detection,  the  images 
were  presented  for  a  single  frame  time  of  approximately  33  ms  and 
observers  viewed  the  monitor  through  neutral  density  filters  to 
achieve  an  overall  reduction  in  display  luminance  of  4.3  log  units 
from  the  recognition  level. 


Figure 2. Spatial frequency filtered images of simulated ship
hulls (panels: low-pass, mid-pass, and high-pass conditions). Within
each filter condition ship A appears in the upper left, ship B in the
upper right, ship C in the lower left, and ship D in the lower right.

Procedure.  Prior  to  beginning  the  experiment,  observers  read 
instructions  which  explained  the  task.  For  detection  they  were  told 
that  a  ship  would  occur  on  every  trial  and  that  only  the  quadrant  in 
which  it  appeared  was  important — its  identity  could  be  ignored. 
Conversely,  for  recognition  they  were  told  to  ignore  the  quadrant  of 
presentation  and  to  identify  the  ship.  In  the  latter  case,  a  sketch 
of  the  four  ship  types  was  provided.  Testing  took  place  in  a 
darkened  room  and  a  ten-minute  dark  adaptation  period  preceded  the 
detection  sessions.  Individual  trials  were  similar  for  the 
detection  and  recognition  sessions  and  began  with  a  500  ms 
presentation  of  a  cross-hair  fixation  which  divided  the  display  into 
quadrants.  Following  this  the  cross-hair  was  replaced  by  one  of  the 
4  ship  images  selected  randomly.  This  remained  visible  for  33  ms 
for  the  detection  trials  (1  video  frame  time)  or  132  ms  for  the 
recognition  trials  (4  frame  times).  Observers  entered  their 
response  on  a  standard  keyboard.  For  recognition  trials,  verbal 
feedback  regarding  the  correct  response  was  displayed  on  the  monitor 
for  2  s.  No  feedback  was  provided  on  the  detection  trials.  The 
duration  of  the  inter-trial-interval  varied  depending  on  the  time 
required  to  obtain  the  next  image  from  a  disk  file,  but  was 
approximately  1.5  s.  Observers  completed  384  trials  per  session  (6 
occurrences  of  the  4  ships  by  4  filter  conditions  by  4  quadrants) 
for  5  sessions  totalling  1920  trials  per  individual. 
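The session size follows directly from the factorial design. The sketch below builds one session's trial list; it assumes a fully crossed, shuffled design, which the text implies by its counts but does not state explicitly, and all names are hypothetical.

```python
import itertools
import random

SHIPS = ["A", "B", "C", "D"]
FILTERS = ["unfiltered", "low", "mid", "high"]
QUADRANTS = ["upper left", "upper right", "lower left", "lower right"]
REPETITIONS = 6   # occurrences of each ship/filter/quadrant combination
SESSIONS = 5

def build_session(rng):
    """Return a shuffled list of (ship, filter, quadrant) trials for one session."""
    cells = list(itertools.product(SHIPS, FILTERS, QUADRANTS))
    trials = cells * REPETITIONS
    rng.shuffle(trials)
    return trials

session = build_session(random.Random(0))
trials_per_session = len(session)                    # 6 x 4 x 4 x 4
trials_per_observer = trials_per_session * SESSIONS  # five sessions
```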

Results  and  Discussion 
Overall  recognition  performance.  A  mean  percentage  correct  was 
calculated  for  each  condition  for  each  of  the  four  observers  across 
the  five  experimental  test  sessions.  These  overall  means  are 
presented  in  Figure  3  for  the  three  filtered  and  the  unfiltered 
images.  These  data  were  submitted  to  a  three-way  (filter  condition 
by  ship  by  day)  repeated  measures  ANOVA.  Several  findings  were  of 
interest.  First,  as  expected,  a  significant  main  effect  of  filter 
was  obtained,  F(3,9)=26.56,  p<.001,  with  no  significant  interactions 
between  filter  and  any  of  the  other  variables.  A  post-hoc  analysis 
of  these  differences  with  Duncan's  New  Multiple  Range  Test  revealed 
that  performance  was  significantly  better  for  the  unfiltered  images 
(69%)  than  for  any  other  condition,  that  the  high-pass  imagery  was 
recognized  more  reliably  (45%)  than  the  mid-  and  low-pass  cases  (31% 
and  32%,  respectively),  and  that  the  mid-  and  low-pass  cases  were 
not  reliably  different  from  each  other. 

Second,  no  main  effect  of  ship  occurred  (43%,  45%,  43%,  and  45% 
for  ships  A,  B,  C,  and  D,  respectively),  F(3,9)<1.0,  and  no 
significant  interactions  were  obtained  between  ship  and  any  other 
factor.  This  result  indicates  that  no  single  ship  had  unique  or 
idiosyncratic  properties  which  may  otherwise  limit  interpretation  of 
the  filter  effect. 

Third,  a  reliable  main  effect  of  day  was  observed, 
F(4,12)=6.87,  p<.01,  with  overall  performance  increasing  across  the 
first  four  days  and  leveling  off  by  the  fifth  day  (37%,  40%,  45%, 
49%,  and  50%  for  the  five  days,  respectively).  Although  not 
specifically  predicted,  a  practice  effect  of  this  type  is  not 
unexpected.  The  further  finding  that  no  reliable  interactions 
occurred  between  day  and  any  other  factor  indicates  that  practice 
simply  led  to  improved  performance,  regardless  of  the  viewing 
condition. 

Response  bias  in  recognition.  The  above  findings  are 
consistent  with  our  predictions.  It  would  be  difficult  to  account 
for  the  observed  pattern  of  results  by  response  bias  alone. 
Nevertheless,  overall  performance  level  may  reflect  response  bias 
tendencies  as  well  as  actual  observer  sensitivity  to  the  image 
attributes. 

A  preliminary  analysis  was  carried  out  to  examine  the  recognition 
data  for  evidence  of  response  bias.  This  analysis  involved 
compiling  the  frequency  of  each  recognition  response  for  each  filter 
condition  and  observer.  Any  tendency  to  favor  a  particular  response 
regardless  of  the  actual  ship  that  was  presented  would  indicate  the 
presence  of  response  bias.  These  frequencies  were  analyzed  by  a 
two-way,  repeated  measures  ANOVA  (filter  condition  by  ship).  No 
significant  main  effects  or  interactions  were  obtained.  Hence, 
there  was  no  systematic  bias.  Despite  this,  a  detailed  examination 
of  individual  data  did  suggest  a  slight  response  bias  for  one 
observer.  Specifically,  this  individual  displayed  a  tendency  to 
indicate  ship  A  or  ship  B  (both  with  square  deck  structures) 
whenever  an  unfiltered  or  high-frequency  image  occurred  (61%  vs. 
39%)  and  to  indicate  ship  C  or  ship  D  (circular  deck  structures) 
whenever  a  mid-  or  low-frequency  image  occurred  (65%  vs.  35%). 
This  suggests  that  the  presence  of  high  spatial  frequencies  led  this 
individual  to  "see"  a  ship  with  sharp  features  (the  square  deck 
structures)  rather  than  one  with  smooth  features.  Nevertheless, 
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this  tendency  was  not  sufficiently  strong  to  play  a  major  role  in 


the  overall  data. 


Analysis of recognition confusions. The previous analyses have
shown that observers recognize the unfiltered and high-frequency
ships more reliably than they do the mid- and low-frequency ships,
and that this result cannot be attributed to response bias.
Additional analyses were carried out on the types of confusions
which actually occurred to obtain more information about what
aspects of the imagery made the mid- and low-frequency conditions
difficult. Two by two confusion matrices were derived for each
individual and filter condition, one for the split/full deck house
attribute and the other for the square/circular deck structures
attribute. A response-bias free index of performance, d' (see
Green & Swets, 1966), was then determined for each matrix by
defining a hit as a split-deck category response (ships A or C)
given that a split-deck occurred and a false alarm as a split-deck
response when a full-deck actually occurred (ships B or D).
Analogous definitions were used for the square/circular deck
structures matrix.
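The d' computation described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name, the example counts, and the 0.5 cell correction (used here only to keep rates away from 0 and 1) are all assumptions.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Return d' = z(hit rate) - z(false-alarm rate) for a 2x2 confusion matrix.

    For the deck-house attribute, a "hit" is a split-deck response to a
    split-deck ship and a "false alarm" is a split-deck response to a
    full-deck ship.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    n_signal = hits + misses
    n_noise = false_alarms + correct_rejections
    # Assumed correction (not from the report): add 0.5 to each count so
    # that rates of exactly 0 or 1 do not produce infinite z-scores.
    hit_rate = (hits + 0.5) / (n_signal + 1)
    fa_rate = (false_alarms + 0.5) / (n_noise + 1)
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one observer and one filter condition.
example = d_prime(hits=40, misses=10, false_alarms=10, correct_rejections=40)
print(f"d' = {example:.2f}")
```

Because d' is the difference of two z-scores, an observer who simply favors one response raises the hit and false-alarm rates together, which leaves the index largely unchanged.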


A mean d' discrimination index was then determined for each
filter and attribute by averaging across individuals. These means
are shown in Figure 4. A two-way (filter by attribute), repeated
measures ANOVA revealed reliable main effects of both filter,
F(3,9)=27.24, p<.001, and attribute, F(1,3)=27.24, p<.001, as well
as a significant filter by attribute interaction, F(3,9)=5.90,
p<.025. As is evident in Figure 4, the filter effect obtained with
these bias-free means mirrors that reported for the overall
performance analysis (mean d's: unfiltered=2.18, high=1.04,
low=.27, and mid=.28). The main effect of attribute reflects the
large performance advantage for the split/full deck discrimination
(mean d'=1.39) over the circular/square structure discrimination
(mean d'=.50). Furthermore, as seen in Figure 4, the reliable
interaction indicates that this advantage occurred primarily for the
unfiltered and high-frequency images.

Recognition latency analysis. A mean response latency was
determined  for  each  condition  and  individual  in  the  experiment.  The 
results  are  shown  in  Figure  5.  These  data  were  analyzed  by  a 
three-way  (filter  condition  by  ship  by  day),  repeated  measures 
ANOVA.  The  analysis  revealed  a  significant  main  effect  of  filter, 
F(3,9)=5.82,  p<.025;  no  other  main  effects  or  interactions  were 
significant  at  the  .05  level.  Inspection  of  Figure  5  suggests  that 
the  main  effect  of  filter  condition  reflects  a  partitioning  of  the 
latencies  into  two  sets,  relatively  fast  for  the  low-frequency 
images  (1394  ms),  and  relatively  slow  for  the  unfiltered,  high-  and 
mid-frequency  images  (overall  mean  of  2031  ms).  A  follow-up  post 
hoc  analysis  with  Duncan's  New  Multiple  Range  test  confirmed  this 
observation  with  reliable  differences  occurring  across  the  slow  and 
fast  groups,  but  no  reliable  differences  occurring  within  the  slow 
conditions.  Although  individuals  differed  dramatically  in  their 
average  response  time  (from  1442  ms  to  2163  ms),  each  showed  this 
pattern.  These  findings,  coupled  with  the  accuracy  data,  suggest 
that  observers  might  have  regarded  the  low-frequency  images  as  a 
"lost  cause"  and  responded  relatively  quickly  whenever  they 
occurred.  On  the  other  hand,  a  simple  speed/accuracy  tradeoff 
cannot  account  for  the  overall  pattern  of  latencies  because 
observers took significantly longer to respond to the mid- than to
the  low-frequency  images  even  though  these  two  conditions  did  not 
differ  in  accuracy. 

Summary of recognition results. Overall, the recognition
findings were consistent with the predictions developed in the
introduction. The low- and mid-frequency images were nearly
impossible to recognize reliably under the conditions presented
here, whereas the high-frequency and unfiltered images led to
reasonably accurate recognition levels. Furthermore, a follow-on
analysis of attribute confusions indicated that neither attribute
could be discriminated in the low- and mid-frequency images, the
higher frequency deck structures attribute was reasonably well
discriminated for only the unfiltered images (mean d'=1.88), and the
lower frequency deck house attribute could be discriminated in both
the unfiltered (d'=2.86) and the high-frequency imagery (d'=1.82).
Finally, the pattern of response latencies was consistent with the
accuracy analyses in suggesting that observers regarded the
low-frequency images as very difficult or impossible to recognize.

Overall detection performance. The mean percentage correct
detection was determined for each condition and each observer in the
experiment. Because a four-alternative forced-choice detection
procedure was used, these data, unlike the overall recognition data,
provide a bias-free index of detection performance (Green & Swets,
1966). These means are plotted by day in Figure 6 for each of the
four filter conditions. A three-way (filter condition by ship by
day) repeated measures ANOVA was carried out on these data. This
analysis revealed significant main effects of filter, F(3,9)=70.51,
p<.001, and ship, F(3,9)=4.45, p<.05, as well as significant filter
by ship, F(9,27)=5.66, p<.001, and filter by day, F(12,36)=3.99,
p<.001, interactions. The reliable main effect of ship reflects a
small performance difference across the four ships (66%, 69%, 69%,
and 66% for ships A, B, C, and D, respectively). This difference
will not be considered further.

As  seen  in  Figure  6,  the  main  effect  of  filter  condition 
reflects  the  predicted  detection  advantage  for  the  low-frequency 
images,  with  performance  on  this  condition  exceeding  even  that  for 
the unfiltered condition (low=83%, unfiltered=76%, high=56%, and
mid=54%). However, these findings must be interpreted within the
context  of  the  two  reliable  interactions.  Consider  first  the  filter 
by  day  interaction  depicted  in  Figure  6.  It  is  obvious  by  visual 
inspection  that  the  four  filters  led  to  a  consistent  pattern  of 
performance  for  all  except  the  first  day  when  a  reversal  of  the 
high-  and  mid-frequency  conditions  occurred.  Since  this  effect  is 
small and theoretically uninteresting, the interaction will not be
considered  further. 

The  more  important  interaction  occurred  between  filter  and 
ship.  Does  this  suggest  that  the  four  ships  led  to  meaningfully 
different  detection  performance  for  the  different  filter  conditions? 
The  relevant  means  are  shown  in  Table  1.  A  simple  effects  analysis 
revealed  a  highly  significant  main  effect  of  filter  for  each  of  the 
four  ships,  indicating  that  a  filter  effect  did  occur  for  each  of 
the  four  ships  as  suggested  by  Table  1.  Furthermore,  post  hoc 
comparisons  were  carried  out  on  each  simple  effects  analysis  with 
Duncan's Test. This revealed that both the low-frequency and
unfiltered images were detected reliably better than the mid- and
high-frequency  images  for  all  ships,  but  that  the  low-frequency 
condition  was  detected  reliably  better  than  the  unfiltered  images 
only  for  ship  C. 

In  summary,  detection  was  better  for  images  containing  low 
spatial  frequencies  (80%  overall  for  the  low-frequency  and 
unfiltered  images)  than  for  images  containing  only  the  higher 
spatial  frequencies  (55%  overall  for  the  mid-  and  high-frequency 
images).  In  addition,  the  fact  that  there  was  a  consistent 
tendency,  statistically  reliable  for  ship  C,  for  the  low-frequency 
images  to  produce  better  detection  than  the  broad-band,  unfiltered 
images,  suggests  that  the  presence  of  high  spatial  frequencies  in 
the  images  might  have  interfered  with  the  observers'  ability  to 
detect  the  ships. 

Detection latency analysis. A mean detection response latency
was determined for each individual and condition. A three-way
(filter condition by ship by day) repeated measures ANOVA revealed a
significant main effect of filter, F(3,9)=5.07, p=.025, and a
significant filter by ship interaction, F(9,27)=2.71, p=.025. No
other effects were significant at the .05 level. As in the case of
the recognition latencies, inspection of the overall means for each
filter condition reveals a partitioning into fast and slow
responses. However, in this case the two conditions that led to
accurate detection also led to fast responses (934 ms on the
average) whereas those that led to poor detection (mid- and
high-frequency) showed slow responding (1019 ms on the average).
Before considering these findings further, the filter by ship
interaction  must  be  considered.  The  relevant  means  as  well  as  the 
F's  resulting  from  a  simple  effects  analysis  on  each  ship  are  shown 
in  Table  2.  As  may  be  seen,  a  reliable  simple  effect  of  filter 
occurred  for  each  of  the  four  ships  and  a  similar  "fast/slow" 
pattern  of  latencies  occurred  for  all  but  ship  C.  Post  hoc 
follow-on  analyses  with  Duncan's  Test  revealed  (a)  that  the 
low-frequency  and  unfiltered  image  latencies  did  not  differ  for  any 
ship,  (b)  that  the  low-frequency  images  were  detected  significantly 
faster  than  the  mid-  and  high-frequency  images  for  all  ships,  and 
(c)  that  the  unfiltered  images  were  significantly  faster  than  the 
mid-  and  high-frequency  images  for  only  ships  A  and  B.  These 
latency  results  are  consistent  with  the  detection  accuracy  data  in 
distinguishing  the  images  with  low-frequency  content  (the  unfiltered 
and  low-frequency  images)  from  those  with  relatively  little 
low-frequency  information  (the  mid-  and  high-frequency  images). 
Although  highly  speculative  in  the  context  of  this  experiment,  it  is 
interesting  that  the  relatively  faster  response  times  observed  for 
the low-frequency images are consistent with the known temporal
characteristics  of  the  low-spatial  frequency  channels  reviewed  in 
the  introduction. 


Table 2

Mean detection response latency (ms) by ship and filter condition.

                     Filter Condition
Ship      Low      Mid      High     Unfilt    Mean
A         944      1027     1025     923       980
B         913      1012     1030     937       973
C         916      1036     982      961       974
D         926      1027     1014     958       982
Mean      924      1025     1013     945


Experiment 2


The results of Experiment 1 were consistent with the
attentional hypothesis on the role of spatial scale in visual
perception. Different ranges of spatial frequencies led to optimal
performance for the detection and recognition tasks. An additional
question raised by Julesz and Papathomas (1984) concerns the ability
of  observers  to  regulate  the  attended  spatial  frequency  channels  on 
a  voluntary  basis.  Their  demonstration  supported  what  they  referred 
to  as  a  "strong"  form  of  the  hypothesis  in  revealing  that  some 
voluntary  control  does  exist.  This  leads  to  the  further  question  of 
whether,  if  given  a  choice,  observers  would  voluntarily  select 
imagery  that  had  been  spatially  filtered  to  include  an  optimal 
frequency  band.  This  question  is  investigated  in  the  second 
experiment.  In  particular,  the  detection  and  recognition  tasks  of 
Experiment  1  were  replicated,  but  in  Experiment  2,  observers  were 
given  control  over  the  filter  condition  viewed  on  each  trial. 
Immediately  prior  to  image  presentation  the  observer  selected  which 
of  the  four  filter  conditions  to  present  (low-,  mid-,  or 
high-frequency dominant, or unfiltered). If the observers are
sensitive  to  the  role  of  spatial  frequency  filtering  on  detection 
and  recognition  performance  then  performance  should  be  optimized  by 
the  selection  of  low-frequency  images  for  the  detection  task  and 
unfiltered  images  for  the  recognition  task.  This  finding  would 
suggest  that  individuals  have  a  reliable  "meta-perception"  or 
intuition  regarding  what  will  contribute  to  good  performance  in  a 
simple  perceptual  task  (Nisbett  &  Wilson,  1977;  Ericsson  &  Simon, 
1980).

Method 

Observers. Eight undergraduate volunteers served in the
experiment, four in the recognition task and four in the detection
task. All reported normal or corrected-to-normal vision, and none
participated in Experiment 1.

Apparatus. The apparatus was identical to that used in
Experiment  1. 

Imagery. The imagery was identical to that used in Experiment 1.

Procedure. The procedure of Experiment 1 was used, but prior
to beginning each trial the observer pressed a key to select which
of the four filter conditions to observe. As a result, the
frequency of occurrence of each filter condition was an additional
dependent variable in this experiment. No specific instructions
were given regarding which filter condition to select; observers
were told simply to select the imagery which would make their task
easiest.  As  in  Experiment  1,  feedback  was  provided  following  the 
recognition  responses,  and  no  feedback  was  given  during  detection. 

Results  and  Discussion 

Filter selection for the recognition task. The mean frequency
of  selection  was  determined  for  each  filter  condition  and  observer 
in  the  experiment.  The  results  of  this  analysis  are  shown  in  Table 
3.  As  is  evident  from  the  table,  two  of  the  four  observers  showed  a 
decided  preference  for  the  unfiltered  images,  selecting  these  images 
on  97%  and  66%  of  the  trials,  whereas  the  remaining  two  observers 
preferred  the  high-pass  images  with  selection  on  58%  and  98%  of  the 
trials. The selection of the high-pass imagery by the latter two
individuals is curious given the finding of Experiment 1 that
high-pass filtered imagery led to poorer recognition performance
than did unfiltered imagery. Recognition performance is examined
next.

Table 3

Mean relative frequency of filter selection and percentage correct
recognition (shown in parentheses) for each of four observers.

                 Filter Condition
Observer      High         Unfilt
              .58 (46)
              .98 (60)
                           .97 (82)
                           .66 (88)

Recognition performance. The mean percentage correct was
determined for each preferred viewing condition for each of the four
observers, collapsed across the five test sessions. These results
appear in parentheses in Table 3. The two observers who showed a
preference for the high-frequency imagery performed substantially
worse (53% correct) than the two with a preference for the
unfiltered imagery (85% correct). This is consistent with the
pattern of Experiment 1, which revealed better recognition
performance for the unfiltered (69%) than for the high-frequency
images (45%). Nevertheless, the overall performance levels achieved
in this experiment were higher than those observed in the first
experiment.

As in Experiment 1, an additional response-bias free analysis
was carried out on the two by two confusion matrices for the
deck-house and deck-structures attributes. The results of this
analysis were consistent with those of Experiment 1 in revealing
superior overall performance for the deck-house attribute (mean
d' = 2.60) than for the deck-structures attribute (mean d' = 1.72).
Furthermore, this analysis also supported the asymmetry between the
two categories of observers identified above. Individuals who
selected the high-frequency imagery did substantially worse on both
attributes than those who selected the unfiltered imagery
(deck-structures: mean d' = .73 vs. 2.72; deck-house: mean
d' = 1.86 vs. 3.34). As in the case of the overall performance
data, these analyses also show better performance for both
attributes in this experiment (mean d' = 1.72 and 2.60) than in the
first experiment (mean d' = .50 and 1.39).


The key to understanding the overall differences between the
results of this experiment and those of Experiment 1 may lie in the
overall practice obtained under each viewing condition. Since each
observer in Experiment 2 tended to select only one of the four image
types for presentation, the selected type (either high-frequency or
unfiltered) occurred far more frequently than in Experiment 1. In
particular, individuals who selected the unfiltered imagery averaged
1579 presentations, whereas those who selected the high-frequency
imagery averaged 1490 presentations. In both cases, the preferred
filter condition appeared more than three times as often in this
experiment as in Experiment 1. This suggests that additional
practice with the selected imagery led to better overall recognition
performance than found in Experiment 1.


Filter selection for the detection task. As in the case of
recognition, the relative frequency of filter selection was
determined for each filter condition and observer. These results
are shown in Table 4. As seen in the table, each of the four
observers had a clear preference for the low-pass imagery with
selection for more than 90% of the trials. This finding is
consistent with the results of Experiment 1 which demonstrated
optimal detection performance for this filter condition. The result
also stands in sharp contrast to the selection results for the
recognition condition in which two of the four observers had a
selection preference for a non-optimal filter.

Detection performance. The mean percentage correct was
determined for the preferred low-pass imagery for each of the four
observers, collapsed across the five test sessions. These results
are shown in parentheses in Table 4. With the exception of Observer
3, who found the task extremely difficult, overall detection
performance (74% with and 83% without Observer 3) was comparable to
that obtained in Experiment 1 (83% overall). This comparability
occurred despite the fact that observers in the present experiment
received substantially more practice with the low-frequency imagery
(mean number of trials = 1739) than did the observers in Experiment
1 (480 trials). This suggests that, unlike recognition, detection
levels are nearly optimal and further improvements would not be
expected with more practice.


General  Discussion 


Overall,  the  results  of  this  study  are  consistent  with  the 
attentional  hypothesis  on  the  role  of  spatial  scale  in  image 
perception.  In  Experiment  1,  different  ranges  of  spatial 
frequencies  led  to  optimal  performance  for  the  detection  and 
recognition  tasks.  Experiment  2  showed  further  that,  when  given  a 
choice,  observers  may  not  always  select  imagery  which  contains 
spatial  frequencies  which  lead  to  optimal  recognition  performance. 
Several  aspects  of  these  results  are  discussed  further  below. 


As noted above, the recognition findings were consistent with
the predictions developed in the introduction. In the first
experiment the unfiltered and high-frequency images were recognized
substantially better than the other images. A follow-on analysis
indicated further that the higher frequency deck-structure
attribute (circular/square) was nearly impossible to discriminate in
all but the unfiltered images, whereas the lower-frequency
deck-house (split/full) discrimination was comparatively easy for
both the high-frequency and unfiltered images. To understand this
result, it is necessary to consider the spatial frequencies involved
in the two discriminations. In a simplified first analysis it can
be argued that the six-pixel discrimination required to distinguish
the split from full deck house would have a fundamental spatial
frequency of 21.33 c/i (cycles/image) or 5.63 c/d (cycles/degree of
visual angle) (128/6 = 21.33 c/i), whereas the three-pixel
difference between the circular and square deck structures has a
fundamental of 42.67 c/i or 11.26 c/d (cf. Ginsburg, 1978; p. 44).
From this perspective, the observation that low- and mid-frequency
images led to poor discrimination is not surprising: the information
was simply not provided within the passbands of these filters to
permit discrimination (.26 to 2.11 c/d for low, 2.38 to 8.44 c/d for
mid).
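The arithmetic behind these fundamental frequencies can be reproduced with a short sketch. The 128-pixel image size and the feature sizes come from the text; the image width in degrees is not stated in this section and is inferred here from the reported pair 21.33 c/i = 5.63 c/d, so it should be read as an assumption. The function names are illustrative only.

```python
IMAGE_SIZE_PIXELS = 128  # image size implied by "128/6 = 21.33 c/i"

# Assumption: image width in degrees of visual angle, inferred from the
# reported equivalence 21.33 c/i = 5.63 c/d (roughly 3.8 degrees).
IMAGE_WIDTH_DEG = 21.33 / 5.63


def fundamental_cpi(feature_pixels, image_pixels=IMAGE_SIZE_PIXELS):
    """Fundamental frequency (cycles/image) of a feature of given size."""
    return image_pixels / feature_pixels


def cpi_to_cpd(cycles_per_image, image_deg=IMAGE_WIDTH_DEG):
    """Convert cycles/image to cycles/degree of visual angle."""
    return cycles_per_image / image_deg


for label, pixels in [("deck house (split/full)", 6),
                      ("deck structures (circular/square)", 3)]:
    cpi = fundamental_cpi(pixels)
    print(f"{label}: {cpi:.2f} c/i, {cpi_to_cpd(cpi):.2f} c/d")
```

Halving the feature size from six to three pixels doubles the fundamental, which is why the deck-structures discrimination sits an octave above the deck-house discrimination in this simplified account.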


However,  this  simplified  analysis  falls  short  of  telling  the 
whole  story.  For  even  relatively  simple  shapes  such  as  those  used 
to  construct  the  ship  images  investigated  here,  differences  between 
objects  in  the  spatial  frequency  domain  are  far  more  subtle  and 
complex  than  is  suggested  in  the  above  analysis.  For  example, 
Figure  7  displays  the  two-dimensional  Fourier  transform  of  the 
difference  between  a  circular  and  square  shape  as  used  in  the 
deck-structures attribute. Examination of this figure makes it
clear that the frequency-domain differences between these two simple
shapes are broadly distributed across the spectrum. In terms of the
filters used in this investigation, information sufficient for
discrimination exists in all three frequency bands. The real issue,
then, concerns the ability of the human observer to make use of this
information.

Figure 7. Log display of two-dimensional spectral magnitude data
for the difference between a circle and square (as in the
deck-structure attribute). The display is centered at the constant
or d.c. point with spatial frequency increasing outward from this
point.
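The kind of difference spectrum shown in Figure 7 can be approximated with a small sketch that transforms the difference between a square and a circle and sums the spectral energy in three radial bands. Everything specific here is an assumption made for illustration: the 32-pixel image, the shape sizes, and the band cutoffs (in cycles/image) are hypothetical and are not the stimuli or filters used in the study. A plain discrete Fourier transform is used so that the example needs only the standard library.

```python
import cmath
import math

N = 32  # hypothetical image size in pixels (not the study's images)


def make_square(n, half_width):
    """Binary image of a centered square."""
    c = (n - 1) / 2
    return [[1.0 if abs(x - c) <= half_width and abs(y - c) <= half_width
             else 0.0 for x in range(n)] for y in range(n)]


def make_circle(n, radius):
    """Binary image of a centered disc."""
    c = (n - 1) / 2
    return [[1.0 if (x - c) ** 2 + (y - c) ** 2 <= radius ** 2 else 0.0
             for x in range(n)] for y in range(n)]


def dft_1d(values):
    """Direct one-dimensional discrete Fourier transform."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n)
                for t, v in enumerate(values)) for k in range(n)]


def dft_2d(image):
    """Two-dimensional DFT: transform the rows, then the columns."""
    rows = [dft_1d(row) for row in image]
    cols = [dft_1d(col) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # back to [ky][kx] order


def band_energy(spectrum, bands):
    """Sum |F|^2 over radial frequency bands given as (low, high]."""
    n = len(spectrum)
    energy = {name: 0.0 for name in bands}
    for ky in range(n):
        fy = ky if ky <= n // 2 else ky - n  # signed frequency index
        for kx in range(n):
            fx = kx if kx <= n // 2 else kx - n
            r = math.hypot(fx, fy)
            for name, (lo, hi) in bands.items():
                if lo < r <= hi:
                    energy[name] += abs(spectrum[ky][kx]) ** 2
    return energy


square = make_square(N, 5.5)
circle = make_circle(N, 6.0)
difference = [[s - c for s, c in zip(srow, crow)]
              for srow, crow in zip(square, circle)]
spectrum = dft_2d(difference)

# Hypothetical band cutoffs in cycles/image; the d.c. term is excluded.
BANDS = {"low": (0, 4), "mid": (4, 10), "high": (10, 32)}
energy = band_energy(spectrum, BANDS)
for name, value in energy.items():
    print(f"{name}: {value:.1f}")
```

In this toy case the shapes differ only at the corners, yet the difference contributes energy to every band, which illustrates the point that a single fundamental frequency does not capture where the discriminating information lies.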

In  the  first  experiment  observers  were  not  able  to  distinguish 
the  more  subtle  deck-structures  attribute  on  the  basis  of  a  subset 
of  spatial  frequencies  regardless  of  where  these  frequencies  fell 
(i.e.,  low-,  mid-,  or  high-band).  Only  the  unfiltered  images 
produced  reliable  discrimination  for  this  attribute.  This  suggests 
that  the  overall  spectral  configuration  or  pattern  of  spatial 
frequencies  is  of  primary  importance  for  discrimination  and  hence 
recognition.

The  results  of  the  second  experiment  suggest  further  that 
substantial  improvements  in  recognition  performance  can  occur  with 
additional  practice.  This  finding  must  be  interpreted  cautiously, 
however,  since  the  image  selection  paradigm  investigated  in  the 
second  experiment  may  have  produced  quite  a  different  perceptual 
task  than  that  investigated  in  the  more  conventional  paradigm  used 
in  the  first  experiment.  In  particular,  the  observers  in  Experiment 
2  were  effectively  classifying  imagery  falling  within  a  single 
spatial  frequency  band.  In  contrast,  observers  in  Experiment  1 
received  imagery  from  four  spatial  frequency  bands  (low,  mid,  high 
and  unfiltered).  This  distinction  may  have  permitted  the  former 
individuals  to  treat  each  of  the  four  images  (4  ships  x  1  filter 
condition) as a unique entity to be categorized in a
paired-associate fashion, whereas the observers in Experiment 1 had
the more difficult task of either learning categories for 16 unique
images (4 ships x 4 filter conditions) or of determining general
features or characteristics for the four ships which would apply
across the various filter conditions. Additional experimentation is
required to understand more fully the performance differences
obtained between Experiments 1 and 2.


Detection. The detection results obtained in the present study
are also consistent with the hypotheses developed in the
introduction. For the luminance and exposure conditions
investigated here, the low spatial frequency imagery led to
unambiguously better detection performance than did imagery from the
three other spatial frequency bands. Furthermore, although
observers in the second experiment received considerably more
practice with the low-frequency imagery than did observers in the
first experiment, no overall detection performance difference
occurred between the two experiments. This suggests that detection
performance was nearly optimal with this imagery in both cases.
Despite this, however, an alternative to the attentional hypothesis
can be proposed to account for the detection results. Specifically,
the 1 cycle/degree center frequency of the low-pass spatial
frequency filters investigated here would be expected to yield
maximum contrast sensitivity at the low luminance levels used
(Campbell & Robson, 1968). This alone could account for the
superior detection performance observed for this filter condition.
More interesting, however, is the finding that overall detection was
better for the low frequency (83% correct) than for the unfiltered
(76%) imagery despite the fact that the unfiltered imagery obviously
contains the low frequency information. As indicated previously,
this  suggests  a  possible  interference  effect  of  the  higher  frequency 
information  contained  in  the  original,  unfiltered  imagery.  As  in 
the  case  of  recognition,  this  underscores  the  importance  of  the 
overall  pattern  of  spatial  frequencies  for  detection  performance. 

Implications  for  image  processing  and  image  quality  metrics. 
Image  processing  and  enhancement  techniques  are  widely  used  in  Navy 
applications such as reconnaissance, weather forecasting, and sonar
imaging.  In  many  such  applications,  a  human  observer  is  required  to 
apply  image  enhancement  algorithms  on  an  interactive  basis  to 
improve  image  quality  for  the  task  at  hand.  Two  aspects  of  this 
problem  deserve  further  comment  in  light  of  the  findings  reported 
here: the role of the human observer in interaction, and the
perceptual  basis  for  assessing  image  quality. 

First,  the  present  study  was  only  part  of  a  larger  project  to 
investigate  human-computer  interaction  in  image  processing  and  was 
not  designed  to  examine  fully-interactive  capabilities.  Despite 
this,  however,  a  limited  "interactive"  capability  was  provided  in 
Experiment  2  when  observers  were  required  to  select  which  of  four 
spatial  frequency  filter  conditions  to  observe  on  each  trial — a 
first  step  in  the  investigation  of  fully-interactive  systems.  The 
results  of  this  experiment  revealed  that  some  observers  did  not 
select  the  optimal  spatial  frequency  parameters  for  image 
recognition.  Additional  research  is  called  for  on  the  problem  of 
determining  what  an  observer  knows  of  the  conditions  that  will  lead 
to  optimal  performance.  Previous  research  by  Peterson,  Goppelt,  & 
Grossman  (1984)  has  shown  that  spatial  frequency  filtering  can  lead 
to improved recognition performance for infrared ship images. The
results  reported  here  suggest  that  observers  may  not  be  able  to  take 
advantage  of  such  image  processing  tools  in  an  interactive  imaging 
environment  without  some  kind  of  additional  decision  aid  or  expert 
system  to  assist  them. 

Second,  image  "quality"  has  received  considerable  attention  in 
the  image  processing  literature  (Snyder,  Shedivy,  &  Maddox,  1982). 
The  objective  of  image  quality  research  is  to  determine  a  scale  or 
metric  of  image  quality  suitable  for  predicting  the  ability  of 
observers  to  extract  information  from  images.  Although  not  designed 
to  investigate  this  problem,  the  findings  of  the  present  study 
suggest  that  singular  measures  of  image  quality  necessarily  fail  to 
capture  the  perceptual  complexity  of  imagery  for  all  tasks.  For 
example,  one  could  argue  that  the  low-pass  imagery  provided  very 
high quality for detection but low quality for recognition, whereas
the reverse was true for the high-pass imagery. No simple measure
can provide
an  accurate  sense  of  image  quality  without  considering  the 
observer's  task.  Additional  work  such  as  that  carried  out  by 
Kuperman  (1985)  is  needed  to  place  image  quality  metrics  on  a  more 
secure  theoretical  footing. 